    Automated Map Reading: Image Based Localisation in 2-D Maps Using Binary Semantic Descriptors

    We describe a novel approach to image based localisation in urban environments using semantic matching between images and a 2-D map. It contrasts with the vast majority of existing approaches which use image to image database matching. We use highly compact binary descriptors to represent semantic features at locations, significantly increasing scalability compared with existing methods and having the potential for greater invariance to variable imaging conditions. The approach is also more akin to human map reading, making it more suited to human-system interaction. The binary descriptors indicate the presence or not of semantic features relating to buildings and road junctions in discrete viewing directions. We use CNN classifiers to detect the features in images and match descriptor estimates with a database of location tagged descriptors derived from the 2-D map. In isolation, the descriptors are not sufficiently discriminative, but when concatenated sequentially along a route, their combination becomes highly distinctive and allows localisation even when using non-perfect classifiers. Performance is further improved by taking into account left or right turns over a route. Experimental results obtained using Google StreetView and OpenStreetMap data show that the approach has considerable potential, achieving localisation accuracy of around 85% using routes corresponding to approximately 200 meters.
    Comment: 8 pages, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 201
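    As a rough illustration of the route-matching idea, the sketch below (our own simplification, not the paper's code) represents each location as a short binary descriptor, concatenates descriptors along a route, and localises by nearest Hamming distance against a database of map-derived routes; the sizes, names and bit-flip noise model are assumptions.

```python
# Minimal sketch: route-level matching with binary semantic descriptors.
# Each location is a fixed-length bit vector marking the presence of semantic
# features (e.g. buildings, junctions) in discrete viewing directions; descriptors
# along a route are concatenated and matched by Hamming distance.
import numpy as np

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two binary descriptors."""
    return int(np.count_nonzero(a != b))

def match_route(query_route: np.ndarray, map_routes: np.ndarray) -> int:
    """Index of the map route whose concatenated descriptor is closest to the query."""
    flat_query = query_route.reshape(-1)
    distances = [hamming(flat_query, r.reshape(-1)) for r in map_routes]
    return int(np.argmin(distances))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_routes, route_len, n_bits = 100, 20, 16        # 20 locations per route, 16-bit descriptors
    map_routes = rng.integers(0, 2, size=(n_routes, route_len, n_bits))
    true_idx = 42
    # Simulate an imperfect classifier by flipping ~10% of the observed bits.
    noise = rng.random(map_routes[true_idx].shape) < 0.1
    query = np.logical_xor(map_routes[true_idx], noise).astype(int)
    print("matched route:", match_route(query, map_routes), "(true:", true_idx, ")")
```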

    Predicting Out-of-View Feature Points for Model-Based Camera Pose Estimation

    In this work we present a novel framework that uses deep learning to predict object feature points that are out-of-view in the input image. This system was developed with the application of model-based tracking in mind, particularly in the case of autonomous inspection robots, where only partial views of the object are available. Out-of-view prediction is enabled by applying scaling to the feature point labels during network training. This is combined with a recurrent neural network architecture designed to provide the final prediction layers with rich feature information from across the spatial extent of the input image. To show the versatility of these out-of-view predictions, we describe how to integrate them in both a particle filter tracker and an optimisation based tracker. To evaluate our work we compared our framework with one that predicts only points inside the image. We show that as the amount of the object in view decreases, being able to predict outside the image bounds adds robustness to the final pose estimation.
    Comment: Submitted to IROS 201
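    The label scaling that enables out-of-view prediction can be pictured with the toy mapping below; the particular scale factor and the [0, 1] output range are our assumptions, not the paper's exact scheme.

```python
# Minimal sketch: feature-point targets are expressed relative to the image so that
# points lying outside the frame still map into the network's output range. A point
# at pixel (u, v) in an image of size (W, H) is mapped so that the image centre is
# 0.5 and a region `scale` times larger than the image still falls inside [0, 1].
def scale_label(u: float, v: float, width: int, height: int, scale: float = 3.0):
    """Map pixel coordinates (possibly out of view) into [0, 1] x [0, 1]."""
    x = 0.5 + (u / width - 0.5) / scale
    y = 0.5 + (v / height - 0.5) / scale
    return x, y

def unscale_prediction(x: float, y: float, width: int, height: int, scale: float = 3.0):
    """Invert scale_label to recover pixel coordinates, which may lie outside the image."""
    u = ((x - 0.5) * scale + 0.5) * width
    v = ((y - 0.5) * scale + 0.5) * height
    return u, v

# A point one full image-width to the right of a 640x480 frame stays representable.
print(scale_label(2 * 640, 100, 640, 480))                                   # -> (~1.0, ~0.40)
print(unscale_prediction(*scale_label(2 * 640, 100, 640, 480), 640, 480))    # -> (1280.0, 100.0)
```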

    The multiresolution Fourier transform: a general purpose tool for image analysis

    The extraction of meaningful features from an image forms an important area of image analysis. It enables the task of understanding visual information to be implemented in a coherent and well defined manner. However, although many of the traditional approaches to feature extraction have proved to be successful in specific areas, recent work has suggested that they do not provide sufficient generality when dealing with complex analysis problems such as those presented by natural images. This thesis considers the problem of deriving an image description which could form the basis of a more general approach to feature extraction. It is argued that an essential property of such a description is that it should have locality in both the spatial domain and in some classification space over a range of scales. Using the 2-d Fourier domain as a classification space, a number of image transforms that might provide the required description are investigated. These include combined representations such as a 2-d version of the short-time Fourier transform (STFT), and multiscale or pyramid representations such as the wavelet transform. However, it is shown that these are limited in their ability to provide sufficient locality in both domains and as such do not fulfil the requirement for generality. To overcome this limitation, an alternative approach is proposed in the form of the multiresolution Fourier transform (MFT). This has a hierarchical structure in which the outermost levels are the image and its discrete Fourier transform (DFT), whilst the intermediate levels are combined representations in space and spatial frequency. These levels are defined to be optimal in terms of locality and their resolution is such that within the transform as a whole there is a uniform variation in resolution between the spatial domain and the spatial frequency domain. This ensures that locality is provided in both domains over a range of scales. The MFT is also invertible and amenable to efficient computation via familiar signal processing techniques. Examples and experiments illustrating its properties are presented. The problem of extracting local image features such as lines and edges is then considered. A multiresolution image model based on these features is defined and it is shown that the MFT provides an effective tool for estimating its parameters. The model is also suitable for representing curves and a curve extraction algorithm is described. The results presented for synthetic and natural images compare favourably with existing methods. Furthermore, when coupled with the previous work in this area, they demonstrate that the MFT has the potential to provide a basis for the solution of general image analysis problems
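    The following numpy sketch (an illustration of the general idea, not the thesis's MFT) shows a combined space / spatial-frequency representation built from windowed 2-D DFTs at several scales, with resolution traded between the two domains from level to level; the window sizes and non-overlapping tiling are assumptions.

```python
# Minimal sketch of a combined space / spatial-frequency stack: the image is tiled
# into windows at several scales and a 2-D DFT is taken over each window, so each
# level provides local frequency content at a different spatial resolution.
import numpy as np

def windowed_spectra(image: np.ndarray, window: int) -> np.ndarray:
    """2-D DFT of each non-overlapping window of size `window` x `window`."""
    h, w = image.shape
    rows, cols = h // window, w // window
    spectra = np.empty((rows, cols, window, window), dtype=complex)
    for r in range(rows):
        for c in range(cols):
            block = image[r * window:(r + 1) * window, c * window:(c + 1) * window]
            spectra[r, c] = np.fft.fft2(block)
    return spectra

def multiresolution_stack(image: np.ndarray, levels=(64, 32, 16, 8)):
    """One combined representation per level, from mostly-frequency to mostly-spatial."""
    return {win: windowed_spectra(image, win) for win in levels}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((256, 256))
    for win, spectra in multiresolution_stack(img).items():
        print(f"window {win:3d}: {spectra.shape[0]}x{spectra.shape[1]} local spectra")
```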

    Absolute pose estimation using multiple forms of correspondences from RGB-D frames

    RGBD Relocalisation Using Pairwise Geometry and Concise Key Point Sets

    iDF-SLAM: End-to-End RGB-D SLAM with Neural Implicit Mapping and Deep Feature Tracking

    We propose a novel end-to-end RGB-D SLAM system, iDF-SLAM, which adopts a feature-based deep neural tracker as the front-end and a NeRF-style neural implicit mapper as the back-end. The neural implicit mapper is trained on-the-fly, while the neural tracker, although pretrained on the ScanNet dataset, is also finetuned along with the training of the neural implicit mapper. Under such a design, our iDF-SLAM is capable of learning to use scene-specific features for camera tracking, thus enabling lifelong learning of the SLAM system. Training of both the tracker and the mapper is self-supervised, without introducing ground truth poses. We test the performance of our iDF-SLAM on the Replica and ScanNet datasets and compare the results to two recent NeRF-based neural SLAM systems. The proposed iDF-SLAM demonstrates state-of-the-art results in terms of scene reconstruction and competitive performance in camera tracking.
    Comment: 7 pages, 6 figures, 3 table
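    A minimal skeleton of the alternating front-end/back-end design is sketched below; the networks, losses and data are placeholders chosen for illustration and do not reflect the iDF-SLAM implementation.

```python
# Minimal PyTorch skeleton of the alternating design: a pretrained tracker estimates
# the camera pose for each incoming frame, an implicit map MLP is trained on-the-fly
# on that frame, and the tracker is finetuned alongside the mapper with no
# ground-truth poses (here a crude placeholder loss couples the two networks).
import torch
import torch.nn as nn

class FeatureTracker(nn.Module):
    """Stand-in for the pretrained deep tracker: frame -> 6-DoF pose vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU(), nn.Linear(128, 6))
    def forward(self, frame):
        return self.net(frame)

class ImplicitMap(nn.Module):
    """Stand-in for the NeRF-style mapper: 3-D point -> (signed distance, colour)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 4))
    def forward(self, points):
        return self.net(points)

tracker, mapper = FeatureTracker(), ImplicitMap()
opt_track = torch.optim.Adam(tracker.parameters(), lr=1e-4)   # finetuned on-the-fly
opt_map = torch.optim.Adam(mapper.parameters(), lr=1e-3)      # trained from scratch

for step in range(10):                        # one iteration per incoming RGB-D frame
    frame = torch.rand(1, 3, 64, 64)          # placeholder RGB frame
    points = torch.rand(256, 3)               # placeholder back-projected depth points
    pose = tracker(frame)                     # front-end: pose estimate (translation used below)
    world_pts = points + pose[:, :3]          # crude "transform to world" placeholder
    sdf = mapper(world_pts)[:, 0]             # back-end: signed distance at surface samples
    loss = sdf.abs().mean()                   # surface points should lie on the zero level set
    opt_track.zero_grad()
    opt_map.zero_grad()
    loss.backward()
    opt_track.step()
    opt_map.step()
```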

    Dual-Domain Image Synthesis using Segmentation-Guided GAN

    We introduce a segmentation-guided approach to synthesise images that integrate features from two distinct domains. Images synthesised by our dual-domain model belong to one domain within the semantic mask and to another in the rest of the image, smoothly integrated. We build on the successes of few-shot StyleGAN and single-shot semantic segmentation to minimise the amount of training required to utilise two domains. The method combines a few-shot cross-domain StyleGAN with a latent optimiser to achieve images containing features of two distinct domains. We use a segmentation-guided perceptual loss, which compares both pixel values and network activations between domain-specific and dual-domain synthetic images. Results demonstrate qualitatively and quantitatively that our model is capable of synthesising dual-domain images on a variety of objects (faces, horses, cats, cars), domains (natural, caricature, sketches) and part-based masks (eyes, nose, mouth, hair, car bonnet). The code is publicly available at: https://github.com/denabazazian/Dual-Domain-Synthesis.
    Comment: CVPR2022 Workshops. 14 pages, 19 figure
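    One way to picture the segmentation-guided perceptual loss is the sketch below, which restricts both a pixel-level and a VGG-activation comparison to the masked region; the layer choice, L1 distances and equal weighting are assumptions rather than the authors' formulation.

```python
# Minimal PyTorch sketch of a segmentation-guided perceptual loss: the binary mask
# selects which region of the dual-domain image is compared against each
# domain-specific image, at the pixel level and at the level of VGG activations.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()  # up to relu3_3
for p in features.parameters():
    p.requires_grad_(False)

def masked_perceptual_loss(dual, reference, mask):
    """Pixel + activation difference restricted to the masked region (mask in [0, 1])."""
    pixel = F.l1_loss(dual * mask, reference * mask)
    f_dual, f_ref = features(dual), features(reference)
    f_mask = F.interpolate(mask, size=f_dual.shape[-2:], mode="nearest")
    act = F.l1_loss(f_dual * f_mask, f_ref * f_mask)
    return pixel + act

# Usage: inside the mask compare against the domain-B image, outside against domain A.
dual = torch.rand(1, 3, 256, 256, requires_grad=True)
domain_a, domain_b = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
mask = (torch.rand(1, 1, 256, 256) > 0.5).float()
loss = masked_perceptual_loss(dual, domain_b, mask) + masked_perceptual_loss(dual, domain_a, 1 - mask)
loss.backward()
```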

    HDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor

    We describe a new method for comparing frame appearance in a frame-to-model 3-D mapping and tracking system using a low dynamic range (LDR) RGB-D camera, which is robust to brightness changes caused by auto exposure. It is based on a normalised radiance measure which is invariant to exposure changes; this not only makes tracking robust under changing lighting conditions, but also enables the subsequent exposure compensation to perform accurately, allowing online building of high dynamic range (HDR) maps. The latter allows frame-to-model tracking to minimise drift as well as better capture light variation within the scene. Results from experiments with synthetic and real data demonstrate that the method provides both improved tracking and maps with a far greater dynamic range of luminosity.
    Comment: 14 page
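    The exposure invariance of a normalised radiance measure can be illustrated as below (our own simplified formulation assuming a gamma camera response, not HDRFusion's exact measure): the same scene patch captured at two exposures maps to the same normalised patch.

```python
# Minimal sketch: pixel intensities are mapped back to radiance with an (assumed)
# inverse gamma response, divided by the exposure, and each patch is normalised to
# zero mean and unit norm so that a global exposure scale cancels out.
import numpy as np

def normalised_radiance(patch: np.ndarray, exposure: float, gamma: float = 2.2) -> np.ndarray:
    """Exposure-normalised, zero-mean, unit-norm radiance patch (intensities in [0, 1])."""
    radiance = np.power(patch, gamma) / exposure      # inverse response (gamma assumed) / exposure
    radiance -= radiance.mean()
    norm = np.linalg.norm(radiance)
    return radiance / norm if norm > 0 else radiance

rng = np.random.default_rng(0)
scene = rng.random((8, 8))
# The same scene patch observed at two different auto-exposure settings.
obs_short = np.clip(np.power(scene * 0.5, 1 / 2.2), 0, 1)
obs_long = np.clip(np.power(scene * 1.0, 1 / 2.2), 0, 1)
a = normalised_radiance(obs_short, exposure=0.5)
b = normalised_radiance(obs_long, exposure=1.0)
print("max difference after normalisation:", float(np.abs(a - b).max()))  # ~0
```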